Investigation of molecular regulation mechanism under the pathophysiology of subarachnoid hemorrhage

Abstract This study aimed to investigate the molecular mechanism under the pathophysiology of subarachnoid hemorrhage (SAH) and identify the potential biomarkers for predicting the risk of SAH. Differentially expressed mRNAs (DEGs), microRNAs, and lncRNAs were screened. Protein–protein interaction (PPI), drug–gene, and competing endogenous RNA (ceRNA) networks were constructed to determine candidate RNAs. The optimized RNAs signature was established using least absolute shrinkage and selection operator and recursive feature elimination algorithms. A total of 124 SAH-related DEGs were identified, and were enriched in inflammatory response, TNF signaling pathway, and others. PPI network revealed 118 hub genes such as TNF, MMP9, and TLR4. Drug–gene network revealed that chrysin targeted more genes, such as TNF and MMP9. JMJD1C-AS-hsa-miR-204-HDAC4/SIRT1 and LINC01144-hsa-miR-128-ADRB2/TGFBR3 regulatory axes were found from ceRNA network. From these networks, 125 candidate RNAs were obtained. Of which, an optimal 38 RNAs signatures (2 lncRNAs, 1 miRNA, and 35 genes) were identified to construct a Support Vector Machine classifier. The predictive value of 38 biomarkers had an AUC of 0.990. Similar predictive performance was found in external validation dataset (AUC of 0.845). Our findings provided the potential for 38 RNAs to serve as biomarkers for predicting the risk of SAH. However, their application values should be further validated in clinical.


Introduction
Intracranial aneurysm (IA) is one of the common neurological diseases, and its incidence rate in the general population is approximately 5% [1]. IA is characterized by localized dilation or ballooning of a cerebral artery. Once an IA ruptures, a subarachnoid hemorrhage (SAH) typically develops [2,3]. SAH is a severe subtype of stroke, occurring in people about 50 years old [4]. Previous research revealed that environmental exposures and genetic predisposition play a role in the susceptibility of SAH, and the estimated heritability is about 40% [5]. Recently, despite considerable advances in therapy for IAs, SAH remains a highly challenging condition associated with a high socioeconomic burden [6,7]. SAH is a critical disease that has to be treated immediately. Therefore, an in-depth understanding of the molecular mechanism of SAH is necessary for the treatment of SAH. In addition, early screening and early active management and prevention of SAH help to reduce the mortality and disability rate of SAH patients. For these two purposes, this study was designed to investigate the molecular mechanism under the pathophysiology of SAH and to identify potential biomarkers that could predict the risk of SAH.
With the development of bioinformatics, gene expression profiling has been widely used to identify the biomarkers for the diagnosis and treatment of SAH [8]. Wang et al. found that six hub genes, BASP1, CEBPB, ECHDC2, GZMK, KLHL3, and SLC2A3, were determined as biomarkers to assess the progression and rupture of IAs [3]. It is known that long non-coding RNAs (lncRNAs) interact with mRNAs, and microRNAs (miRNAs) regulate many processes, such as transcription, translation, regulation of cell differentiation and cell cycle [9]. Interestingly, noncoding RNAs, comprising miRNAs and lncRNAs, play an important role in IAs and SAH [10]. Besides, lncRNAs detected from the biological fluids may be used as noninvasive biomarkers for the diagnosis and prognosis of IAs and SAH [11]. For instance, lncRNA MALAT1 expression was independently associated with the poor overall survival for IAs, and the overexpression of MALAT1 predicted an higher risk of death in IA patients [12]. Circulating miRNAs (such as miR-16 and miR-25) may be novel biological markers that are useful in assessing the likelihood of IA occurrence [13]. Unfortunately, because of poor understanding of the mechanisms of SAH, current diagnosis and treatment of SAH can be inconsistent and/or ineffective [14,15]. Especially, the effects of core RNAs on the progression and prognosis of SAH patients have not been fully identified.
In the present research, we aimed to screen the SAHrelated RNAs as biomarkers to provide new insights for the early screening, diagnosis, and treatment of SAH. For this aim, GSE36791 [16] and GSE73378 [15] datasets from the Gene Expression Omnibus (GEO) database were reanalyzed. A flowchart presenting the experimental design of this study is illustrated in Figure 1.

Data collection and preprocessing
The microarray datasets searched by terms of "subarachnoid hemorrhage" and "Homo sapiens" were acquired in the GEO database as of 2 January 2021. For the purpose of this research, the dataset screening criteria were as follows: (1) blood samples; (2) samples of SAH patients and controls; and (3) the total number of samples >50. There were two datasets meeting the screening criteria, GSE73378 [15] and GSE36791 [16] datasets. The GSE73378 dataset had a total of 226 samples, of which 210 blood samples including 103 SAH samples and 107 control samples were analyzed in this study. GSE36791 dataset had a total of 61 blood samples including 43 SAH samples and 18 control samples. The platform of these two datasets was GPL10558 Illumina HumanHT-12 V4.0 expression beadchip. The corresponding platform annotation files were downloaded from Ensembl genome browser 96 database. Then, the probes in the two datasets were annotated to lncRNAs, miRNAs, and mRNAs based on the annotation files. Of the two datasets, all the analyses were performed based on GSE73378 dataset, and GSE36791 was used just for validation of the expression and predictive performance of the selected feature RNAs.

SAH-related DEGs screening and functional enrichment analysis
SAH-associated genes were downloaded from DisGeNET database [21] by term of "subarachnoid hemorrhage." The SAH-associated genes from DisGeNET database were used as the reference gene sets, gene set enrichment analysis was performed for all genes detected in GSE73378 dataset (genes were ranked by corresponding log FC value) using GSEA software (http://software.broadinstitute.org/ gsea/index.jsp) [22] to further identify SAH-associated genes from GSE73378 dataset. Then, the obtained SAH disease-related genes were merged with DEGs, and the overlapped genes were selected as the SAH-related DEGs. The SAH-related DEGs were used to perform Gene Ontology (GO) enrichment analysis (biological process) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses with the DAVID online tool (version 6.8) [23,24]. FDR < 0.05 was considered to be significantly enriched.

Construction of protein-protein interaction (PPI) network
Interactions among the protein-coding genes in SAHrelated DEGs were retrieved from the STRING database (Version 11.0) [25] with PPI score of 0.4. PPI network was visualized using Cytoscape software (Version 3.6.1) [26] based on interaction pairs.

Construction of the drug-gene network
Connectivity Map (CMap) resource was created to connect human diseases with the genes that underlie them and drugs that treat them. CMap is the first installment of a reference collection of gene-expression profiles from cultured human cells treated with small bioactive molecules, for uncovering the functional connections among diseases, genetic perturbation, and drug action [27,28]. The Comparative Toxicogenomics Database (CTD) is a public resource based on published literature, manually curated associations among genes, chemicals, phenotypes, diseases, and environmental exposures [29]. To predict the small molecule drugs that target the SAHrelated DEGs, both CMap and CTD were used. First, SAH-related DEGs were searched from CMap database to obtain the drug molecule-gene interactions. Second, the SAH-related DEGs were uploaded to CTD database to obtain the drug molecule-gene interactions. Then, the overlapped drug molecule-gene interactions from the two databases were selected. Finally, the drug-gene network was visualized based on the selected drug molecule-gene interactions using Cytoscape 3.6.1 software.

Construction of competing endogenous RNA (CeRNA) network
The connection relationship between DELs and DEMs was constructed by the DIANA-LncBase v2 database [30], and the lncRNA-miRNA interactions with negative correlations of their expression level were selected. The DEMsassociated target genes (miRNA-mRNA) were predicted using five miRNA databases including TargetScan Ver-sion7.2 [31], picTar [32], miRanda [33], RNA22 [34], and PITA [35]. The miRNA-target gene interaction pairs were selected if they were predicted in more than three databases and were further filtered by SAH-related DEGs. Finally, the ceRNA network was established by integrating lncRNA-miRNA interactions and miRNA-mRNA interactions using Cytoscape 3.6.1 software.

Screening of optimal RNAs signature
All RNAs (mRNAs, miRNAs, and lncRNAs) contained in these three networks were used to screen characteristic RNAs by two different algorithms: least absolute shrinkage and selection operator (LASSO) and recursive feature elimination (RFE). In brief, R 3.6.1 lars package (Version 1.2, https://cran.r-project.org/web/packages/lars/index.html) [36] was used to perform the regression analysis to screen characteristic RNAs. The RFE algorithm in the R 3.6.1 caret package (Version 6.0-76, https://cran.r-project.org/web/ packages/caret) [37] was also used to screen the optimal characteristic RNAs. Then, we compared the results of the two algorithms and selected the overlapping RNAs as the final feature RNAs signature.

Evaluation and validation of optimal RNAs signature
We first extracted the expression of the optimal feature RNAs from GSE73378 dataset and GSE36791 dataset. Their expression levels in SAH and normal samples were displayed. Afterward, the Support Vector Machine (SVM) from R 3.6.1 e1071 (Version 1.6-8, https://cran.r-project. org/web/packages/e1071) [38] was used to construct the SVM classifier based on the optimal feature RNAs signature (Core: Sigmoid Kernel; Cross: 100-fold cross-validation). Both GSE73378 dataset and GSE36791 dataset were used for classifier construction. Receiver operating characteristic (ROC) curve analysis was performed with R 3.6.1 pROC (Version 1.12.1, https://cran.r-project.org/web/packages/pROC/ index.html) [39] to calculate the performance of the SVM classifier for SAH. The R codes used in this study have been provided in an additional file.

SAH-related DEGs identification and function enrichment
From the DisGeNET database, a total of 470 genes associated with SAH were obtained. Then, GSEA was performed for all genes in SAH with the reference gene sets of SAH-associated genes from DisGeNET, and a total of 354 SAH-related genes were obtained (Figure 3a). Next the 354 SAH-related genes were compared with 621 DEGs, and a total of 124 overlapping genes were obtained as SAH-related DEGs. Enrichment analysis showed that these genes were enriched in 106 GO-biological processes, such as GO:0071260cellular response to mechanical stimulus, GO:0045944positive regulation of transcription from RNA polymerase II promoter, GO:0048661positive regulation of smooth muscle cell proliferation, GO:0006954inflammatory response, and GO:0001666response to hypoxia ( Figure 3b). In addition, 43 KEGG pathways were significantly enriched for these genes, including hsa05200: pathways in cancer, hsa04668: TNF signaling pathway, hsa04010: MAPK signaling pathway, hsa04066: HIF-1 signaling pathway, and hsa04068: FoxO signaling pathway (Figure 3c).

Construction of PPI network
The SAH-related DEGs were entered into the STRING database, and a total of 830 PPI networks were generated. The PPI network, including 118 gene nodes, was constructed as shown in Figure 4a. The first ten hub genes, TNF, AKT1, TP53, MMP9, TLR4, STAT3, IL1B, TLR2, MYC, and CXCR4, were screened with the highest degree.

Construction of the drug-gene network
From CMAP database, a total of 23 chemicals were obtained to target SAH-related DEGs with the threshold of |Pearson R| > 0.75 and P < 0.05. The drug-gene pairs related to these 23 chemicals were further selected from the CTD database, and a total of 22 drug-gene pairs were obtained to establish a drug-gene network ( Figure A1, Table A1). The network contained ten upregulated genes, five downregulated genes, and five small molecule drugs (coralyne, alexidine, enilconazole, chrysin, and arachidonyltrifluoromethane). Chrysin was found to target more genes, such as TNF, AKT1, and MMP9.

CeRNA network construction
Using the DIANA-LncBase v2 database, seven lncRNA-miRNA interactions involving three miRNAs and five lncRNAs with the negative correlation of their expression levels were obtained. Then, the target genes were predicted for 3 miRNAs in lncRNA-miRNA interactions, and then the target genes were filtered by SAH-related DEGs, and a total of 21 pairs of miRNA-mRNA connections were found. The ceRNA network was established via integration with lncRNA-miRNA and miRNA-mRNA interactions (Figure 4b). The ceRNA network comprised 29 nodes, including 5 lncRNAs, 3 miRNAs, and 21 mRNAs. Notably, upregulated JMJD1C-AS1 may function as a ceRNA to suppress the inhibitory effects of hsa-miR-204 on HDAC4 and SIRT1, thus leading to their upregulated expression. Similarly, upregulated MEG3 may regulate the expression of TGFBR3 and GSK3B by binding to hsa-miR-128. In addition, LINC01144hsa-miR-128 -ADRB2/TGFBR3 regulatory axis was found. We further performed correlation analysis for lncRNA and their associated mRNAs in ceRNA network (Table A2), and weak positive correlations were found. There was a significant positive correlation between LINC01287 and STAT3 (r = 0.35; p < 0.01), indicating that LINC01287hsa-miR-204 -STAT3 was a potential important ceRNA regulatory axis.

Screening and verification of SAHrelated RNAs
LASSO and RFE algorithms were used to screen characteristic RNAs signatures from all RNAs in the three networks. In the training set (GSE73378), a total of 90 RNAs and 52 RNAs were obtained using LASSO and RFE, respectively (Figure 5a and b). Furthermore, a total of 38 overlapping RNAs were obtained as optimal characteristic RNAs signature, including 2 lncRNAs (JMJD1C-AS1 and LINC01144), 1 miRNA (hsa-miR-510), and 35 genes (TLR4, MMP9, ADRB2, TGFBR3, among others) ( Table 1). The expression levels of the optimal characteristic RNAs signature in SAH and normal samples are displayed in Figure 6a and b. Only the two lncRNAs, one miRNA, and top ten mRNAs (ranking by log FC) were displayed. In the GSE73378 dataset, all the 13 RNAs were significantly differentially expressed in the SAH sample compared to that of control samples (Figure 6a). While, in the GSE36791 dataset, the two lncRNAs (JMJD1C-AS1 and LINC01144), hsa-miR-510, and mRNAs (KLF4 and TRPM4) showed no statistical difference on their expression levels between SAH and normal samples (Figure 6b). To validate the diagnostic ability of the optimal characteristic RNAs signature, the SVM classifier were constructed in GSE73378 dataset, which showed well predictive value for SAH patients with an AUC of 0.990 (Figure 6c).
The predictive value of these optimal characteristic RNAs signature was further validated in an external independent dataset (GSE36791). The SVM classifier still showed better performance with an AUC of 0.845 (Figure 6d). The results showed that the RNAs had a robust and stable predictive ability for SAH.

Discussion
This study aimed to discover effective diagnosis biomarkers for SAH by the analysis of sequencing data, which have the potential to guide future clinical and basic medical studies. In the present research, we first identified 621 DEGs, of which 124 SAH-related DEGs were obtained using DisGeNET and GSEA. These genes were enriched in the inflammatory response, cellular response to mechanical stimulus, TNF signaling pathway, and cancer-related pathways. Increasing studies have revealed that IA is closely related to the inflammatory response [40,41]. Moreover, inflammation and immune response have also been found to potentially contribute to the formation of IA [42]. Among these pathways associated with SAH, some studies have also confirmed the role of the TNF signaling pathway in diseases including SAH. The potential of TNF-α inhibitors has been reported to impact the pathogenesis of aneurismal SAH, and the TNF-α signaling pathway has been found to play an important role in the pathogenesis of SAH [43]. In IAs, TNF-α was up-expressed in wall tissues and associated with the type and diameter of the aneurysm [44]. According to these studies, we speculated that the TNF signaling pathway was implicated in SAH development.
PPI network for SAH-related DEGs showed that TNF, MMP9, and TLR4 were hub genes. It has been reported that venous levels of TNF-R1 were associated with poor outcomes at 6 months for SAH [45], and down-regulating TNF-α can inhibit the formation of IAs in vivo [44]. Thus, decreasing TNF expression may have the potential to inhibit SAH. MMP9 was found to be associated with TLR4 signaling activation, and downregulating MMP9 induced by LPS has a neuroprotective effect on brain injury caused by SAH [46]. In addition, TLR4 is a key player in the regulation of inflammation, and it has been found to be correlated with poor prognosis in SAH [47]. Our present results also confirmed that TLR4 was up-expressed in SAH. Subsequently, by constructing a ceRNA network, it was suggested that the downregulated lncRNA MEG3 may be particularly important for SAH, as it may function as a ceRNA for upregulating hsa-miR-128 expression, thus leading to the downregulation of ADRB2 and TGFBR3. Previous studies have shown that MEG3 is highly expressed in SAH, and MEG3 may promote SAHinduced neuronal cell injury by inhibiting the PI3K/AKT signaling pathway [48]. However, MEG3 has also been found to promote platelet phagocytosis by decreasing miR-128 expression to protect VECs from senescence [49]. To the best of our knowledge, the regulatory mechanisms of MEG3 in SAH need further experimental confirmation. Furthermore, TGFBR3 is involved in the activation of the TGF-β signaling pathway, and TGFBR3 is downregulated in pancreatic ductal adenocarcinoma cells [50]. In present data, TGFBR3 was downregulated in SAH, and overexpression of TGRBR3 may be an important therapeutic target in SAH treatment.
We identified 38 optimal characteristic RNAs signatures from the RNAs in these networks, which were used to construct the SVM classifier. The results of ROC curves investigated that these RNAs (such as JMJD1C-AS1, LINC01144, hsa-miR-510, TLR4, ADRB2, TGFBR3, and so on) were potential biomarkers for predicting SAH. MiR-510 has been reported to be significantly downregulated in ovarian serous carcinoma (OSC), and it is a novel candidate biomarker for predicting the symptoms of OSC [51]. However, the role of miR-510 and lncRNAs, JMJD1C-AS1 and LINC01144, in SAH has not been reported. LINC01144-hsa-miR-128-ADRB2/TGFBR3 regulatory axis was found from our ceRNA network, LINC01144 may play a role in SAH by regulating ADRB2 and TGFBR3 expression. ADRB2 encodes adrenoceptor beta 2. Adrenoceptor polymorphisms are associated with an increased risk of cardiac abnormalities after SAH [52], β-adrenoceptor antagonists have been found to suppress the elevation of IL-6 after SAH in rats [53]. TGFBR3 is a transforming growth factor (TGF) beta receptor. TGF-β1/Smad/CTGF pathway was inhibited by rhDecorin to prevent development of hydrocephalus after SAH [54]. Knockdown of TGF-β1 in human umbilical cord-derived mesenchymal stem cells could attenuate SAH-induced chronic hydrocephalus, upregulation of inflammatory cytokines, and other behavioral changes [55]. Considering the important role of ADRB2 and TGFBR3 in SAH, we speculated that LINC01144 was involved in the development of SAH. However, elucidation of the roles of these lncRNAs associated with the screening and prevention of patients with SAH requires further investigation.
We suggested that these identified RNA biomarkers could help doctors to predict the risk of SAH and intervene as soon as possible. Although the feature RNAs were identified just based on the GSE73378 dataset, these feature RNAs still showed well predictive performance in another dataset with different inclusion criteria for patients (patients had the last episode of aneurysmal SAH at least two years in GSE73378 dataset, while this is not mentioned in GSE36791 dataset), which further indicated the stability and reliability of feature RNAs in predicting risk of SAH. Additionally, though the expression and predictive value of these feature RNAs have been validated using another external independent dataset (GSE36791), experimental verification is still indispensable in the future. In addition, the clinical value of these biomarkers should be further confirmed.

Conclusion
In summary, gene expression profile analysis revealed a large scale of expression pattern changes in RNAs under the pathophysiology of SAH, and they were mainly implicated in the inflammatory response, TNF signaling pathway. We further identified 38 RNAs, including 2 lncRNAs (JMJD1C-AS1 and LINC01144), 1 miRNA (hsa-miR-510), and 35 genes (TLR4, ADRB2, TGFBR3, among others) as potential blood biomarkers for screening patients with SAH. This 38 RNAs signature had a better predictive performance for SAH risk. LINC01144 might regulate ADRB2/TGFBR3 expression by sponging hsa-miR-128. These findings of the present study contributed to understanding the molecular mechanism of SAH deeply and also provided the potential biomarkers for the screening and prevention of SAH. However, their application values should be further validated in clinical.
Funding information: The authors state no funding involved.

Conflict of interest:
The authors state no conflict of interest. Data availability statement: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Figure A1: The drug-gene network. Circles and diamonds represent genes and chemicals, respectively. Red color represented the upregulated gene, and green represented a downregulated gene.